Jan Wieck JanWieck at Yahoo.com
Wed Jul 4 11:52:14 PDT 2007
On 7/4/2007 10:52 AM, Christopher Browne wrote:
> Jan Wieck <JanWieck at Yahoo.com> writes:
>> On 7/3/2007 12:33 PM, Christopher Browne wrote:
>>> "Andrew Hammond" <andrew.george.hammond at gmail.com> writes:
>>
>>>> Also, ISTM that the big reason we don't like statement based
>>>> replication is that SQL has many non-deterministic aspects. However,
>>>> there is probably a pretty darn big subset of SQL which is provably
>>>> deterministic. And for that subset, would it be any less
>>>> rigorous to transmit those statements than to transmit the per-row
>>>> change statements like we currently do?
>>> Well, by capturing the values, we have captured a deterministic form
>>> of the update.
>>
>> How to figure out what is deterministic and what isn't? A simple
>>
>>     insert into summary select id, sum(value) from detail group by id;
>>
>> seems pretty deterministic, doesn't it? But the result of it depends
>> on the exact commit order and the transaction isolation level. We
>> don't capture the commit order of single transactions, nor do we care
>> for it anywhere in the Slony-I logic.
> 
> But at the time that we apply these changes in log_actionseq order, we
> have imposed a deterministic order.  (Which happens to be repeatable,
> on each node.)

The question was how we figure out which SQL statements are 
deterministic and thus candidates for SQL query string propagation - 
aside from the fact that there is no standard way in Postgres to capture 
query strings or parsetrees anyway. So far we have only established that 
the logging of the changes has to be done the way we are doing it now, 
on a per-row basis where the actionseq determines the repeatable order. I 
don't see how going through a lot of effort to group those individual 
log rows together again will gain us much.
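
To make the difference concrete, here is a minimal Python sketch. It is
not Slony-I code. The function names and the (actionseq, id, total) log
layout are assumptions made up for illustration, and "which detail rows
are visible" stands in for the effects of commit order and isolation
level.

```python
from collections import defaultdict

def run_summary(detail_rows):
    """Emulate: insert into summary select id, sum(value) from detail group by id."""
    totals = defaultdict(int)
    for row_id, value in detail_rows:
        totals[row_id] += value
    return dict(totals)

def replay(log_rows):
    """Apply captured summary rows in actionseq order (hypothetical log layout)."""
    summary = {}
    for _actionseq, row_id, total in sorted(log_rows):
        summary[row_id] = total
    return summary

# On the origin, the statement saw these detail rows when it ran.
origin_summary = run_summary([(1, 10), (2, 7)])

# Re-running the statement text on a subscriber where one more detail
# row is already visible produces a different sum for id 1.
statement_result = run_summary([(1, 10), (2, 7), (1, 5)])

# The row log holds the values the origin actually wrote. Even if the
# rows arrive out of order, sorting by actionseq restores the same result.
log = [(101, 1, 10), (102, 2, 7)]
row_based_result = replay(list(reversed(log)))
```

In this toy model row_based_result always matches origin_summary, while
statement_result only matches if the subscriber happens to see exactly
the same detail rows as the origin did.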

Unless the effort also attempts to group together consecutive insert and 
update statements affecting the same columns and using prepared 
statements, and unless we have some evidence that doing so gains more 
than all that grouping costs, I don't think it is a good idea to make 
that part of remote_worker.c any more complicated than it is today. How 
many developers do we have who actually understand how that part of slon 
really works and who could go in and fix some bug in there?
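
For the sake of discussion, the kind of grouping I mean could look
roughly like the Python sketch below. This is not how remote_worker.c
works. The dict-based log row format and the function names are
hypothetical. Only adjacent rows are merged, so the actionseq order is
preserved.

```python
from itertools import groupby

def batch_key(row):
    # Rows can share one prepared statement only if the operation,
    # the table and the column list all match.
    return (row["op"], row["table"], tuple(row["values"].keys()))

def group_consecutive(log_rows):
    """Group adjacent log rows (already in actionseq order) that share a key."""
    batches = []
    for key, rows in groupby(log_rows, key=batch_key):
        # One batch = one statement shape plus a list of parameter tuples.
        batches.append((key, [tuple(r["values"].values()) for r in rows]))
    return batches

# Hypothetical log rows, already sorted by actionseq.
log = [
    {"op": "I", "table": "detail", "values": {"id": 1, "value": 10}},
    {"op": "I", "table": "detail", "values": {"id": 2, "value": 7}},
    {"op": "U", "table": "summary", "values": {"total": 17}},
    {"op": "I", "table": "detail", "values": {"id": 3, "value": 4}},
]
batches = group_consecutive(log)
```

Here the first two inserts collapse into one batch, but the third insert
cannot join them because the update sits in between. Whether batches
like that save more than the bookkeeping costs is exactly the evidence
we don't have.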


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck at Yahoo.com #

