[Slony1-general] Slony & High Volume Updates

Thu Nov 15 07:42:38 PST 2007

Hey folks, I'm running Slony 1.2.9 and in general it is working
fantastically (THANKS!!)

I'm noticing in my pgfouine reports, and from watching the db, that the
Slony queries often take up a lot of time.  This may be due to the
high update volume of some of the replicated tables (one table in
particular gets about 6k-8k updates every 5 minutes, so roughly
1.2k-1.6k updates a minute).

I've run some of the queries that show up through EXPLAIN, and the
plans seem sane.  I'm just wondering if there is something I should
look into to speed this up, or if I should investigate something like
log shipping.  (I've got a master and 3 slaves.  2 of the slaves can
lag a bit; 1 of the slaves should stay within about 1s of the master.)

Examples (times in seconds):

3,556.72s | DELETE FROM "_replication".sl_log_1 WHERE log_origin = '2' AND
log_xid < '4074411134'; DELETE FROM "_replication".sl_log_2 WHERE
log_origin = '2' AND log_xid < '4074411134'; DELETE FROM
"_replication".sl_seqlog WHERE seql_origin = '2' AND seql_ev_seqno <
'1296696'; SELECT "_replication".logswitch_finish();

57.73s | notify "_replication_Event"; notify "_replication_Confirm";  
INSERT INTO "_replication".sl_event (ev_origin, ev_seqno,  
ev_timestamp, ev_minxid, ev_maxxid, ev_xip, ev_type ) VALUES ('4',  
'2049054', '2007-11-14 04:50:10.063141', '59155951', '59156428',  
'''59156426'',''59155951''', 'SYNC'); INSERT INTO  
"_replication".sl_confirm (con_origin, con_received, con_seqno,  
con_timestamp) VALUES (4, 3, '2049054', now()); COMMIT transaction;
57.72s | notify "_replication_Event"; notify "_replication_Confirm";  
INSERT INTO "_replication".sl_event (ev_origin, ev_seqno,  
ev_timestamp, ev_minxid, ev_maxxid, ev_xip, ev_type ) VALUES ('4',  
'2049054', '2007-11-14 04:50:10.063141', '59155951', '59156428',  
'''59156426'',''59155951''', 'SYNC'); INSERT INTO  
"_replication".sl_confirm (con_origin, con_received, con_seqno,  
con_timestamp) VALUES (4, 1, '2049054', now()); COMMIT transaction;
34.33s | notify "_replication_Event"; notify "_replication_Confirm";  
INSERT INTO "_replication".sl_event (ev_origin, ev_seqno,  
ev_timestamp, ev_minxid, ev_maxxid, ev_xip, ev_type ) VALUES ('4',  
'2040322', '2007-11-14 02:21:41.311908', '58935882', '58935883', '',  
'SYNC'); INSERT INTO "_replication".sl_confirm (con_origin,  
con_received, con_seqno, con_timestamp) VALUES (4, 3, '2040322',
now()); COMMIT transaction;

105.57s | fetch 100 FROM LOG;
102.80s | fetch 100 FROM LOG;
80.58s | fetch 100 FROM LOG;

I'm going to turn on duration logging for all queries tonight to make  
sure I get a more accurate picture of where time is being spent.    
But like I said, I suppose this could be the nature of the beast with  
that crazy update table.

Should I check whether those slow notify... queries were running at
the same time as that log switch?
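
Once duration logging is on, one way to test that would be to turn each
logged statement into a (start, end) interval and see whether the
intervals intersect.  Below is a minimal Python sketch of that idea.  It
assumes each log entry gives the statement's completion time and its
duration in seconds, and it treats the figures above as single-statement
durations (they may really be totals across many runs).  The completion
times and the helper names interval() and overlaps() are made up for
illustration; they are not taken from the real logs or from any Slony
or pgfouine API.

```python
from datetime import datetime, timedelta

def interval(end, seconds):
    """Return (start, end) for a statement that finished at `end`
    after running for `seconds` seconds."""
    end_ts = datetime.strptime(end, "%Y-%m-%d %H:%M:%S")
    return end_ts - timedelta(seconds=seconds), end_ts

def overlaps(a, b):
    """True if the two (start, end) intervals share any span of time."""
    return a[0] < b[1] and b[0] < a[1]

# Hypothetical completion times; only the durations come from the report.
log_switch = interval("2007-11-14 05:00:00", 3556.72)
slow_sync = interval("2007-11-14 04:51:08", 57.73)

print(overlaps(log_switch, slow_sync))
```

If the slow SYNC inserts consistently fall inside the log switch
interval, that would point at contention with the cleanup rather than
at the update volume alone.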

thanks!

--
Jeff Trout <jeff at jefftrout.com>
http://www.dellsmartexitin.com/
http://www.stuarthamm.net/