Tue Sep 27 20:08:19 PDT 2005
Joe Markwardt wrote:

>Hi all,
> I am having some problems trying to get Slony to catch up after a
>weekend of heavy updates. I've tried playing with the -s and -g
>settings for slon, but that hasn't seemed to have any effect on the
>situation. I'm not sure where to proceed from here, so here's where I
>say "uncle" and ask for help from those wiser than I :-) I'm not sure
>what info you need, so I'll just have at it (if I leave anything out,
>let me know and I'll get it for you):
>
>Setup: replicate 3 databases from node 1 to node 2. Each database is
>replicated using a separate Slony config file (using the slon-tools).
>
> node 1: provider
>   PG 7.4.7
>   Slony 1.1.0
>
> node 2: subscriber
>   PG 8.0.3
>   Slony 1.1.0
>
>Problem:
> node 2 is ~3 days behind node 1 after a series of heavy updates over
>the weekend, and it is falling further behind.
>
>Symptoms:
> the FETCH statement on the provider is eating 100% of one of the
>CPUs on the server while not touching the disks at all. The
>subscriber spends 99% of its time "idle in transaction" (waiting for
>the FETCH to finish?).
>
>Details:
>
>From the node 2 slon log:
>
>2005-09-27 10:55:28 CDT DEBUG2 remoteListenThread_1: queue event 1,1691509 SYNC
>2005-09-27 10:55:28 CDT DEBUG2 remoteHelperThread_1_1: 120.122 seconds delay for first row
>2005-09-27 10:55:28 CDT DEBUG2 remoteHelperThread_1_1: 120.122 seconds until close cursor
>2005-09-27 10:55:28 CDT DEBUG2 remoteWorkerThread_1: new sl_rowid_seq value: 1000000000000000
>2005-09-27 10:55:28 CDT DEBUG2 remoteWorkerThread_1: SYNC 1610537 done in 120.170 seconds
>2005-09-27 10:55:28 CDT DEBUG2 remoteWorkerThread_1: Received event 1,1610538 SYNC
>2005-09-27 10:55:28 CDT DEBUG2 remoteWorkerThread_1: SYNC 1610538 processing
>2005-09-27 10:55:28 CDT DEBUG2 remoteWorkerThread_1: syncing set 1 with 114 table(s) from provider 1
>2005-09-27 10:55:30 CDT DEBUG2 syncThread: new sl_action_seq 1 - SYNC 635243
>
>From sl_status:
>
>select * from _pl_replication.sl_status;
>
> st_origin                 | 1
> st_received               | 2
> st_last_event             | 1691576
> st_last_event_ts          | 2005-09-27 10:58:05.907383
> st_last_received          | 1610538
> st_last_received_ts       | 2005-09-27 10:55:44.046123
> st_last_received_event_ts | 2005-09-24 14:02:55.407712
> st_lag_num_events         | 81038
> st_lag_time               | 2 days 20:55:12.07423
>
>From sl_log_1:
>
>select count(*) from _pl_replication.sl_log_1;
>
>  count
>----------
> 14427361
>
>Any ideas or suggestions on how to reverse this process so it will
>start catching up again?

We have just been able to point at a Probably Really Good answer to
this; the fix just got checked into CVS HEAD and 1_1_STABLE.

Let me suggest that you create an extra index:

  create index sl_log_1_idx2
    on _pl_replication.sl_log_1 (log_xid _pl_replication.xxid_ops);

This needs to be created on any provider node; it won't be
particularly valuable on nodes that only subscribe. It should
MASSIVELY increase the speed of the queries; you will more than
likely see the "120.122 seconds delay for first row" message diminish
to just a few seconds.
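Once that index exists on the provider, you can verify that the
planner actually uses it. The query below is a hypothetical probe
(the real cursor the remote worker opens is more involved), but it
filters on log_xid with the xxid operators in the same way, so the
plan should show an index scan on sl_log_1_idx2 rather than a
sequential scan over the ~14 million rows in sl_log_1:

  -- Hypothetical probe query; the xid value here is illustrative.
  -- Run this on the provider (node 1) after creating the index.
  explain
  select count(*)
    from _pl_replication.sl_log_1
   where log_xid > '1000000'::_pl_replication.xxid;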
In the short term, you might also want to consider two other things
(a sketch of a full slon invocation follows below):

a) Set the "-s" option somewhat higher until things catch up; the
   default is 10000 ms, so "-s60000" would generate one SYNC per
   minute, which might be useful.

b) It looks like you're processing just one SYNC at a time; make sure
   the "-g" option is set to some value. We often run with "-g40" to
   group 40 SYNCs together at a time.
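For reference, here is a sketch of what a slon invocation might look
like with both options applied. The cluster name "pl_replication" is
inferred from your "_pl_replication" schema; the connect string is
purely illustrative and must match your own setup. Note that -s
matters on the slon serving node 1 (it controls how often that node
generates SYNC events), while -g matters on the slon serving node 2
(it controls how many remote SYNCs get grouped into one apply
transaction):

  # Illustrative invocation only -- substitute your real conninfo.
  # -g40:    apply up to 40 SYNCs per transaction on the subscriber
  # -s60000: generate a SYNC event every 60 seconds on the origin
  slon -g40 -s60000 pl_replication "dbname=mydb host=somehost user=slony"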