Tue Sep 27 22:37:42 PDT 2005
- Previous message: [Slony1-general] Slony Falling Behind
- Next message: [Slony1-general] Slony Falling Behind
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Tue, 2005-09-27 at 17:30 -0400, Jan Wieck wrote:
> On 9/27/2005 4:16 PM, Joe Markwardt wrote:
> > On Tue, 2005-09-27 at 15:38 -0400, Christopher Browne wrote:
> >> Jan Wieck wrote:
> >> > On 9/27/2005 3:08 PM, Christopher Browne wrote:
> >> >> Joe Markwardt wrote:
> >> >>> Hi all,
> >> >>> I am having some problems trying to get slony to catch up after a
> >> >>> weekend of heavy updates. I've tried playing with the -s and -g
> >> >>> settings for slony, but that hasn't seemed to have any effect on
> >> >>> the situation. I'm not sure where to proceed from here, so here's
> >> >>> where I say "uncle" and ask for help from those wiser than I :-)
> >> >>> I'm not sure what info you need, so I'll just have at it (if I
> >> >>> leave anything out, let me know and I'll get it for you):
> >> >>>
> >> >>> Setup: replicate 3 databases from node 1 to node 2.
> >> >>> Each database is replicated using a separate slony config file
> >> >>> (using the slon-tools).
> >> >>>
> >> >>> node 1: provider
> >> >>>   PG 7.4.7
> >> >>>   Slony 1.1.0
> >> >>>
> >> >>> node 2: subscriber
> >> >>>   PG 8.0.3
> >> >>>   Slony 1.1.0
> >> >>>
> >> >>> Problem:
> >> >>> node 2 is ~3 days behind node 1 after a series of heavy updates
> >> >>> over the weekend, and it is falling further behind.
> >> >>>
> >> >>> Symptoms:
> >> >>> the FETCH statement on the provider is eating 100% of one of the
> >> >>> CPUs on the server, not touching the disks at all. The subscriber
> >> >>> spends 99% of its time "idle in transaction" (waiting for the
> >> >>> FETCH to finish?).
> >> >>>
> >> >>> Details:
> >> >>>
> >> >>> From the node 2 slony log:
> >> >>>
> >> >>> 2005-09-27 10:55:28 CDT DEBUG2 remoteListenThread_1: queue event 1,1691509 SYNC
> >> >>> 2005-09-27 10:55:28 CDT DEBUG2 remoteHelperThread_1_1: 120.122 seconds delay for first row
> >> >>> 2005-09-27 10:55:28 CDT DEBUG2 remoteHelperThread_1_1: 120.122 seconds until close cursor
> >> >>> 2005-09-27 10:55:28 CDT DEBUG2 remoteWorkerThread_1: new sl_rowid_seq value: 1000000000000000
> >> >>> 2005-09-27 10:55:28 CDT DEBUG2 remoteWorkerThread_1: SYNC 1610537 done in 120.170 seconds
> >> >>> 2005-09-27 10:55:28 CDT DEBUG2 remoteWorkerThread_1: Received event 1,1610538 SYNC
> >> >>> 2005-09-27 10:55:28 CDT DEBUG2 remoteWorkerThread_1: SYNC 1610538 processing
> >> >>> 2005-09-27 10:55:28 CDT DEBUG2 remoteWorkerThread_1: syncing set 1 with 114 table(s) from provider 1
> >> >>> 2005-09-27 10:55:30 CDT DEBUG2 syncThread: new sl_action_seq 1 - SYNC 635243
> >> >>>
> >> >>> From sl_status:
> >> >>>
> >> >>> select * from _pl_replication.sl_status;
> >> >>>  st_origin | st_received | st_last_event |      st_last_event_ts      | st_last_received |    st_last_received_ts     | st_last_received_event_ts  | st_lag_num_events |      st_lag_time
> >> >>> -----------+-------------+---------------+----------------------------+------------------+----------------------------+----------------------------+-------------------+-----------------------
> >> >>>          1 |           2 |       1691576 | 2005-09-27 10:58:05.907383 |          1610538 | 2005-09-27 10:55:44.046123 | 2005-09-24 14:02:55.407712 |             81038 | 2 days 20:55:12.07423
> >> >>>
> >> >>> From sl_log_1:
> >> >>>
> >> >>> select count(*) from _pl_replication.sl_log_1;
> >> >>>   count
> >> >>> ----------
> >> >>>  14427361
> >> >>>
> >> >>> Any ideas or suggestions on how to reverse this process so it will
> >> >>> start catching up again?
> >> >>
> >> >> We have just been able to successfully point at a Probably Really Good
> >> >> answer to this; it just got checked into CVS HEAD and 1_1_STABLE...
> >> >>
> >> >> Let me suggest that you create an extra index:
> >> >>
> >> >> create index sl_log_1_idx2 on _pl_replication.sl_log_1(log_xid
> >> >> _pl_replication.xxid_ops);
> >> >
> >> > Is that an optimization problem introduced by using BETWEEN instead of
> >> > the excessive OR chains?
> >>
> >> No, the BETWEEN patch is not in CVS yet, so nobody would be using it by
> >> accident :-).
> >>
> >> The extra index has proven useful with both version 1.0.5 and 1.1.0.
> >
> > Unfortunately it doesn't appear to have helped me out any :-/
> >
> > pl=# \d _pl_replication.sl_log_1
> >          Table "_pl_replication.sl_log_1"
> >     Column     |         Type         | Modifiers
> > ---------------+----------------------+-----------
> >  log_origin    | integer              |
> >  log_xid       | _pl_replication.xxid |
> >  log_tableid   | integer              |
> >  log_actionseq | bigint               |
> >  log_cmdtype   | character(1)         |
> >  log_cmddata   | text                 |
> > Indexes:
> >     "sl_log_1_idx1" btree (log_origin, log_xid _pl_replication.xxid_ops, log_actionseq)
> >     "sl_log_1_idx2" btree (log_xid _pl_replication.xxid_ops)
> >
> > 2005-09-27 14:39:06 CDT DEBUG2 remoteHelperThread_1_1: 119.242 seconds delay for first row
> > 2005-09-27 14:39:06 CDT DEBUG2 remoteHelperThread_1_1: 119.243 seconds until close cursor
> > 2005-09-27 14:39:06 CDT DEBUG2 remoteWorkerThread_1: new sl_rowid_seq value: 1000000000000000
> > 2005-09-27 14:39:06 CDT DEBUG2 remoteWorkerThread_1: SYNC 1610644 done in 119.289 seconds
> >
> > I did also make the changes to the -s and -g parameters, and it appears
> > to be generating SYNCs less often, but it doesn't really look to be
> > catching up any faster.
> >
> > postgres 12839 0.0 0.6 288840 51708 pts/3 S 14:48 0:00 /usr/local/pgsql/bin//slon -s 60000 -d2 -g80 pl_replication host=192.168.1.200 dbname=pl
> >
> > Any other changes you can suggest?
>
> Did you give it an ANALYZE after creating the index?
>
>
> Jan

Yep. And I restarted the slons as well, so they should be using the index
as long as the planner thinks it will help. Is there any way I can see the
query that slon is generating to create the cursor? If I could see that
query, then I could do some EXPLAIN ANALYZEs and see what they say...

-Joe
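For anyone digging into the same question: the cursor query slon sends can be
captured on the provider with plain PostgreSQL tooling. A minimal sketch,
assuming the parameter forms of that era (log_statement is a boolean on 7.4,
while 8.0 takes 'all') and assuming stats_command_string is enabled so that
pg_stat_activity reports the running statement:

    -- postgresql.conf on the 7.4 provider, followed by a reload:
    --   log_statement = true           # writes every statement, including
    --                                  # slon's DECLARE ... CURSOR and FETCH,
    --                                  # to the server log
    --   stats_command_string = true    # lets pg_stat_activity show queries

    -- While a SYNC is grinding away, the long-running statement is visible via:
    select procpid, current_query
      from pg_stat_activity
     where current_query like '%sl_log_1%';

Once the DECLARE ... CURSOR text is in hand, the SELECT inside it can be run
under EXPLAIN ANALYZE in psql to see whether the planner actually reaches
sl_log_1_idx2 or still sequential-scans the 14-million-row log table.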
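And on Jan's ANALYZE point, a minimal check on the provider. The xid range
below is purely hypothetical, made up for illustration; the values (and the
exact predicate shape) should come from the captured cursor query:

    -- Refresh planner statistics so the indexes are costed against the
    -- table's current 14M+ rows:
    analyze _pl_replication.sl_log_1;

    -- Hypothetical range, for illustration only; substitute values taken
    -- from the real DECLARE ... CURSOR statement. The thing to look for is
    -- whether the plan switches from a sequential scan to an index scan on
    -- sl_log_1_idx2 (or sl_log_1_idx1).
    explain analyze
    select count(*)
      from _pl_replication.sl_log_1
     where log_origin = 1
       and log_xid >= '1000000'
       and log_xid < '1000500';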