[Slony1-general] size of requests stored in sl_log_x

Mon Sep 10 15:29:52 PDT 2007

On 9/10/2007 4:01 PM, Cyril SCETBON wrote:
> 
> Jan Wieck wrote:
>> On 9/10/2007 12:49 PM, Cyril SCETBON wrote:
>>>
>>> Christopher Browne wrote:
>>>> Cyril SCETBON wrote:
>>>>>
>>>>>
>>>>> Jan Wieck wrote:
>>>>>> On 9/10/2007 10:07 AM, Cyril SCETBON wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> We're hitting 400 requests per second and noticed that sl_log_? 
>>>>>>> are growing a lot. This is also due to the fact that our tables 
>>>>>>> have a lot of columns (more that 100). We noticed that slony 
>>>>>>> stores each column and its value for each request in sl_log_x. 
>>>>>>> Would it not be more optimized to store just the user request ? 
>>>>>>> Actually, a user is just updating a few column, or inserting a 
>>>>>>> tuple by giving some attributes but not all, however as slony 
>>>>>>> stores every column with its value it sl_log_x, these tables are 
>>>>>>> growing very fast :-(
>>>>>>
>>>>>> Slony only logs columns where the value actually has changed. 
>>>>>> Please explain in detail how you propose that the log table would 
>>>>>> contain which columns have changed without naming them.
>>>>> when I execute this command on the master :
>>>>>
>>>>> insert into t1(ise) values('cyril100001');
>>>>>
>>>>> I can see in sl_log_1 :
>>>>>
>>>>> psql>select log_cmddata from sl_log_1;
>>>>> (ise,id,id2) values ('cyril100001','24161',NULL)
>>>>>
>>>>> dbtest=# \d t1;
>>>>>                                Table "public.t1"
>>>>> Column |         Type          |                    
>>>>> Modifiers                   
>>>>> --------+-----------------------+------------------------------------------------- 
>>>>>
>>>>> ise    | character varying(54) | not null
>>>>> id     | integer               | not null default 
>>>>> nextval('t1_id_seq'::regclass)
>>>>> id2    | integer               |
>>>>> Indexes:
>>>>>    "t1_pkey" PRIMARY KEY, btree (ise)
>>>>> Triggers:
>>>>>    "_CLUSTER1_logtrigger_1" AFTER INSERT OR DELETE OR UPDATE ON t1 
>>>>> FOR EACH ROW EXECUTE PROCEDURE "_CLUSTER1".logtrigger('_CLUSTER1', 
>>>>> '1', 'kvv')
>>>>>
>>>>> I agree for id which is an auto_increment but not for id2 :-(
>>>>>
>>>>> In another plateform my table has more than  100 columns as said 
>>>>> earlier and that's really matter :-(
>>>> Jan's request seems to remain perfectly good...
>>>>
>>>> "Please explain in detail how you propose that the log table would 
>>>> contain which columns have changed without naming them."
>>>>
>>>> It is *not* obvious how to avoid naming all of the columns.  It is 
>>>> not safe to simply assume "that column was null so we may omit it" - 
>>>> different nodes may be configured differently, and it may well be 
>>>> important to actually have that NULL value.
>>> I was just thinking of using the same way as other statement based 
>>> replication way like does mysql, that is to say just log the request 
>>> that user did. However, I forgot that slony uses triggers to log 
>>> requests :-( Maybe finding a way to know which current request is 
>>> done on the table for which the trigger has been called would be 
>>> possible ?
>>
>> Turning Slony into a statement based replication system would make it 
>> far worse. 
> You may be right. It was just a suggestion to do something to replace 
> the long SQL string by a shorter one to gain space, and so performance 
> when getting blocks from disks with more rows :-)
>> Queries depend on the exact transaction commit order with respect to 
>> the exact rows that are visible to them. That in addition to the 
>> non-deterministic behavior of sequential scans and a myriad of 
>> functions ... how do you expect some query like
>>
>>     update mytable set foo = nextval('someseq') where foo is null;
>>
>> to lead to exactly the same result on two different databases? Not 
>> even the order in which the rows are processed is defined.
> If MySQL support it , why slony would not ? 
> (http://dev.mysql.com/doc/refman/5.1/en/replication-features-autoincid.html), 
> but you right for deterministic functions it's hard if value is added in 
> the statement. But it's not what I've suggested.

What does MySQL's autoincrement feature have to do with the above UPDATE 
query?

>> Compared to Postgres, MySQL is a fairly limited system feature wise. 
>> Yet with their limited functionality the MySQL user manual already 
>> states that several features are not supported by their statement 
>> based replication. What do you think the corresponding documentation 
>> for Slony would look like?
> Sorry, but you can compare it to the log shipping feature of slon with 
> less lag time.

In what way are these two even remotely similar?

Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck at Yahoo.com #