Mon Jul 2 20:15:38 PDT 2007
Subject: [Slony1-general] Soliciting ideas for v2.0
From: Jan Wieck <JanWieck at Yahoo.com>
On 7/2/2007 3:53 PM, Christopher Browne wrote:
> Jan Wieck <JanWieck at Yahoo.com> writes:
>> On 7/2/2007 2:03 PM, Jan Wieck wrote:
>>> On 7/2/2007 1:45 PM, Marko Kreen wrote:
>>>> On 7/2/07, Jan Wieck <JanWieck at yahoo.com> wrote:
>>>>> The stuff I am currently (very slowly) working on is that very
>>>>> problem. Any long running transaction causes the minxid in the
>>>>> SYNCs to be stuck at that very xid for the entire runtime of the
>>>>> LRT. The problem with that is that the log selection in the slon
>>>>> worker uses an index scan whose only index scankey candidates are
>>>>> the minxid of one snapshot and the maxxid of another. That is the
>>>>> range of rows returned by the scan itself. Since the minxid is
>>>>> stuck, it will select larger and larger groups of log tuples only
>>>>> to filter most of them out at a higher level in the query via
>>>>> xxid_le_snapshot().
>>>> How the LRT problem is avoided in PGQ:
>>>> http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/skytools/skytools/sql/pgq/functions/pgq.batch_event_sql.sql?rev=1.2&content-type=text/x-cvsweb-markup
>>>> The basic idea is that there are only a few LRTs, so it is
>>>> reasonable to pick up the bottom half of the range by event txid,
>>>> one by one.
>>> Hmmm, that is an interesting idea. And it is (in contrast to what
>>> I've been playing with) node insensitive, since it doesn't need
>>> info that is only available on the event origin, like CLOG. Thanks.
>>
>> Not only is it interesting, but it is astonishingly simple to adopt
>> into our code. I want to do some more testing before I commit this
>> change, but the really interesting thing here is that it is only a
>> 3 line change in the remote_worker.c file, which could easily be
>> backported into 1.2.
>>
>> I had created a really pathetic test case here by SIGSTOP'ing the
>> slon while doing the copy_set() for a day, so it had some 90000
>> events of backlog. About a third into that backlog, it was down to
>> a 60+ second delay for the first log row and, due to the dynamics
>> of the group size, was processing on a single-event basis. That
>> same database is now moving through the backlog in batches of 5-8
>> minutes each, has a <1 second delay for the first log row, and does
>> those groups in 50-70 seconds.
>>
>> This looks very promising.
>
> Drew Hammond's keen on having some BSD-oriented scripts put into the
> 1.2 branch that I had only put into HEAD; this might be an excuse for
> a 1.2.11.

IF ... if ... (really if) ... this all checks out to do what it is
supposed to do. We have to test whether this sort of change has any
adverse side effects on Slony installations running with hundreds of
concurrent DB connections, for example. So far it only looks pretty
against a simple N1->N2 setup bombarded with a -c5 pgbench. That isn't
quite the testing you want to have done before committing such a
substantial change to the inner core log selection logic of STABLE
code, is it?

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck at Yahoo.com #
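
To make the problem concrete, here is a rough sketch of the query
pattern Jan describes, using hypothetical table and column names
(log_table, log_txid) rather than the actual Slony-I schema; the call
shape of xxid_le_snapshot() is likewise illustrative:

  -- The only usable index scankeys are the minxid of one snapshot and
  -- the maxxid of the other. While an LRT pins the minxid in place,
  -- this window grows with every SYNC, and xxid_le_snapshot() discards
  -- most of the fetched rows at a higher level of the plan.
  SELECT *
    FROM log_table
   WHERE log_txid >= :stuck_minxid              -- pinned at the LRT's xid
     AND log_txid <  :cur_maxxid                -- keeps advancing
     AND xxid_le_snapshot(log_txid, :cur_sync); -- filters most rows out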
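And a sketch of the PGQ-style split that Marko points to, again with
hypothetical names and simplified relative to the real
pgq.batch_event_sql(): the dense upper part of the window is fetched
with one tight range scan, and the few long-running transactions below
it are picked up individually by txid:

  -- Dense part: everything above the highest txid known committed
  -- below all open LRTs, up to the new snapshot's upper bound.
  SELECT *
    FROM log_table
   WHERE log_txid >  :lo_xmax
     AND log_txid <= :hi_xmax

  UNION ALL

  -- Sparse part: since there are only a few LRTs, pick up their
  -- txids one by one once they have committed.
  SELECT *
    FROM log_table
   WHERE log_txid IN (:lrt_xid_1, :lrt_xid_2);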