[Slony1-general] Soliciting ideas for v2.0

Wed Jul 4 12:36:35 PDT 2007

On July 4, 2007 11:52 am, Jan Wieck wrote:
> On 7/4/2007 10:52 AM, Christopher Browne wrote:
> > Jan Wieck <JanWieck at Yahoo.com> writes:
> >> On 7/3/2007 12:33 PM, Christopher Browne wrote:
> >>> "Andrew Hammond" <andrew.george.hammond at gmail.com> writes:
> >>>> Also, ISTM that the big reason we don't like statement based
> >>>> replication is that SQL has many non-deterministic aspects. However,
> >>>> there is probably a pretty darn big subset of SQL which is provably
> >>>> deterministic. And for that subset, would it be any less
> >>>> rigorous to transmit those statements than to transmit the per-row
> >>>> change statements like we currently do?
> >>>
> >>> Well, by capturing the values, we have captured a deterministic form
> >>> of the update.
> >>
> >> How to figure out what is deterministic and what isn't? A simple
> >>
> >>     insert into summary select id, sum(value) from detail group by id;
> >>
> >> seems pretty deterministic, doesn't it? But the result of it depends
> >> on the exact commit order and the transaction isolation level. We
> >> don't capture the commit order of single transactions, nor do we care
> >> for it anywhere in the Slony-I logic.
> >
> > But at the time that we apply these changes in log_actionseq order, we
> > have imposed a deterministic order.  (Which happens to be repeatable,
> > on each node.)
>
> The question was, how do we figure out which SQL statement would be
> deterministic and thus a candidate for SQL query string propagation -
> aside from the fact that there is no standard way in Postgres to capture
> query strings or parsetrees anyway. So far we have only established that
> the logging of the changes has to be done the way we are doing it now,
> on a per-row basis where the actionseq determines the repeatable order. I
> don't see how going through a lot of effort to group those individual
> log rows together again will gain us a lot.
>
> Unless the effort also attempts to group together consecutive insert and
> update statements affecting the same columns and using prepared
> statements, and unless we have some evidence that doing so will gain
> more than the effort of all that grouping costs, I don't think it is a
> good idea to make that part of remote_worker.c any more complicated than
> it is today. How many developers do we have who actually understand how
> that part of slon really works and who could go in and fix some bug in
> there?

I have a better understanding of how this works today than I did a month ago,
but it still feels a lot like black magic in there, so I'm with Jan on this
one: unless we can show that there is a significant advantage to doing so,
it's not worth the complication.
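
To make the point about determinism concrete, here is a minimal sketch in
Python. It is a toy model, not Slony-I code: apply_row_log, summarize and the
(actionseq, op, key, value) tuple layout are all names made up for
illustration. It assumes the summary statement on the origin ran before a
concurrent insert became visible, while a subscriber re-running the same
statement text would already see that insert.

```python
# Toy model only: none of these names or structures come from Slony-I.

def summarize(detail):
    """Stand-in for:
    insert into summary select id, sum(value) from detail group by id;
    """
    summary = {}
    for id_, value in detail:
        summary[id_] = summary.get(id_, 0) + value
    return summary


def apply_row_log(log):
    """Replay captured row changes in actionseq order (hypothetical format)."""
    table = {}
    for actionseq, op, key, value in sorted(log):
        if op in ("I", "U"):
            table[key] = value
        elif op == "D":
            table.pop(key, None)
    return table


detail_before = [(1, 10), (1, 20), (2, 7)]
concurrent_insert = (1, 5)  # committed by another transaction (assumed)

# Origin: the statement did not see the concurrent insert.
origin_summary = summarize(detail_before)

# Row-based capture records the values the origin actually wrote.
row_log = [
    (seq, "I", id_, total)
    for seq, (id_, total) in enumerate(sorted(origin_summary.items()), start=1)
]

# Subscriber replaying the row log reproduces the origin, in any arrival order.
replica_from_rows = apply_row_log(list(reversed(row_log)))

# Subscriber re-running the statement text sees different input data.
replica_from_statement = summarize(detail_before + [concurrent_insert])
```

In this sketch the row-log replay matches the origin regardless of what else
has committed on the subscriber, while the re-executed statement does not,
which is the case Jan's example describes.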

Effort spent keeping Slony from suffering the ill effects of long-running
transactions feels to me like a much better basket to put our development
eggs into.

>
>
> Jan

-- 
Darcy Buskermolen
Command Prompt, Inc.
+1.503.667.4564 X 102
http://www.commandprompt.com/
PostgreSQL solutions since 1997