[Slony1-general] size of requests stored in sl_log_x

Sun Sep 16 17:04:06 PDT 2007

On 9/16/2007 10:06 AM, Cyril SCETBON wrote:
> 
> Jan Wieck wrote:
>> On 9/16/2007 4:27 AM, Cyril SCETBON wrote:
>>>
>>> Filip Rembiałkowski wrote:
>>>> Guys, this discussion went quite offtopic, but maybe you missed one
>>>> fact that Cyril may no know about.
>>>>
>>>> In postgres, setting N columns to NULL is just N bits of physical 
>>>> writes.
>>>>
>>>> So the overhead of
>>>>
>>>> INSERT INTO t1 ( id, data1, data2, data3, ..., data100 )
>>>> VALUES( 12345, 'the only non-null data', NULL,NULL, ..., NULL )
>>>>
>>>> is not so terrible.
>>>>   
>>> Thanks filip, but I didn't talk about the performance when applying 
>>> this request, but the fact that storing a longer request than a 
>>> simple insert into t1(col1,col2) values(valcol1,valcol2) causes slony 
>>> tables to grow faster, and needs more network bandwidth, that's all :-)
>>
>> And we can't do that because imagine you have a schema
>>
>>     create table t1 (
>>         a int primary key,
>>         b text default 'foo'
>>     );
>>
>> and then do
>>
>>     insert into t1 (a, b) values (1, NULL);
>>
>> the resulting row, that make it into the masters table, will be
>>
>>     (1, NULL)
>>
>> Yeah, another one of those little things where MySQL behaves different 
>> and where Postgres is right according to ANSI. If we omit that NULL 
>> column now from the INSERT on the subscriber, the default will be used 
>> there, resulting in
>>
>>     (1, 'foo')
>>
>> and we have the subscriber out of sync.
>>
>>
>> Jan
>>
> The suggestion was to catch the request entered so here (a,b) (1, NULL) 
> would be stored. but if user enter insert into t1(b) values('b') , then 
> (b) ('b') would be stored, and as every subscribe has the same schema 
> the default value would be used on them. 

And what exactly do you capture as "request entered" if the original 
query was "insert into foo (a, b, c) select x, y, z from temp_bar;" ?

There certainly is some optimization possible. No doubt. But it isn't 
achievable as cheap as you think. One would first have to agree that all 
nodes in the whole cluster have to have tables containing the same 
columns. Then, the column names can be mapped to short integers in a new 
slony configuration table. Finally, the sl_log tables query field has to 
be split up into a short integer array specifying the table column order 
in the second field, containing all the values.

The question is, are you willing to develop such a patch, or at least 
willing to sponsor development of such a patch?

Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck at Yahoo.com #