Tue Aug 21 15:16:29 PDT 2012
- Previous message: [Slony1-general] Monitoring big table sync
- Next message: [Slony1-general] Monitoring big table sync
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Tue, Aug 21, 2012 at 5:41 PM, CS DBA <cs_dba at consistentstate.com> wrote:

> Hi all;
>
> I have an initial sync in progress. It's currently syncing a 32GB table.
> I'd like to be able to verify that some progress is happening. Here's
> what I see:
>
> log1 (the master slon log) has this:
>
> NOTICE: Slony-I: log switch to sl_log_2 still in progress - sl_log_1 not truncated
> CONTEXT: PL/pgSQL function "cleanupevent" line 99 at assignment
> 2012-08-21 14:40:26 MDT INFO cleanupThread: 0.043 seconds for cleanupEvent()
> NOTICE: Slony-I: log switch to sl_log_2 still in progress - sl_log_1 not truncated
> CONTEXT: PL/pgSQL function "cleanupevent" line 99 at assignment
> 2012-08-21 14:51:48 MDT INFO cleanupThread: 0.054 seconds for cleanupEvent()
> 2012-08-21 14:51:48 MDT INFO cleanupThread: 0.006 seconds for vacuuming
> NOTICE: Slony-I: log switch to sl_log_2 still in progress - sl_log_1 not truncated
> CONTEXT: PL/pgSQL function "cleanupevent" line 99 at assignment
> 2012-08-21 15:03:06 MDT INFO cleanupThread: 0.043 seconds for cleanupEvent()
>
> log2 (the slave slon log) has this:
>
> $ tail slon.node2.log
> 2012-08-21 04:24:05 MDT CONFIG remoteWorkerThread_1: 92441 bytes copied for table "sch1"."rtl_pkg"
> 2012-08-21 04:24:05 MDT CONFIG remoteWorkerThread_1: 0.377 seconds to copy table "sch1"."rtl_pkg"
> 2012-08-21 04:24:05 MDT CONFIG remoteWorkerThread_1: copy table "sch1"."std_contr"
> 2012-08-21 04:24:05 MDT CONFIG remoteWorkerThread_1: Begin COPY of table "sch1"."std_contr"
> NOTICE: truncate of "sch1"."std_contr" succeeded
> 2012-08-21 04:24:05 MDT CONFIG remoteWorkerThread_1: 35072 bytes copied for table "sch1"."std_contr"
> 2012-08-21 04:24:05 MDT CONFIG remoteWorkerThread_1: 0.345 seconds to copy table "sch1"."std_contr"
> 2012-08-21 04:24:05 MDT CONFIG remoteWorkerThread_1: copy table "sch2"."order_hist"
> 2012-08-21 04:24:05 MDT CONFIG remoteWorkerThread_1: Begin COPY of table "sch2"."order_hist"
> NOTICE: truncate of "sch2"."order_hist" succeeded
>
> The system
> time is now:
>
> Tue Aug 21 15:24:30 MDT 2012
>
> So the sch2.order_hist table has been syncing for 11 hours. The
> sch2.order_hist table is 32GB, and I did expect it to take some time,
> but I would like to be able to validate some progress. A select * from
> pg_stat_activity on the master shows me that this sql is running:
>
> copy "sch1"."order_hist" ("id","order_hist_id","cust_name","order_id","order_type","order_dt","sw_version") to stdout;
>
> And a select * from pg_stat_activity on the slave shows me that this
> sql is running:
>
> select "_custdb_slony".prepareTableForCopy(186);
> copy "sch2"."order_hist" ("id","order_hist_id","cust_name","order_id","order_type","order_dt","sw_version") from stdin;
>
> Other than this I see no "verifiable" activity, plus the lag_events in
> sl_status on the master continue to grow; currently we're at 57,355
> lag events.
>
> Thoughts?

That is exactly what I'd expect to see if things are working properly.

The one thing I'd be a *bit* concerned about is the 57K event lag. At one point, I recall there being a possibly troublesome case where one of the queries for the first SYNC processed after the copy could time out while trying to read in all of those tens of thousands of events.

It would be nicer if, during this laggy period, SYNCs got generated somewhat less often, say every 10-20s rather than every 0.5s. There's not much benefit to super-fine granularity if you are catching up on 40 hours of backlog. Fine granularity of SYNCs may help keep replicas more up to date once subscribed, but it doesn't help one bit during the 40-hour backlog.

You could increase the parameter "sync_interval" for the slon process managing the origin node, perhaps raising it to 1000 or even 10000, to have SYNCs generated less frequently on that node. And you can do this without interrupting the subscription: the slon process that is copying data to the subscriber is the one managing the subscriber node.
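A minimal sketch of the two steps above: watching the event lag via Slony's sl_status view, and raising sync_interval for the origin slon. The cluster schema name `_custdb_slony` is taken from the thread; the host, database, and config-file names are placeholders for your environment, and the destructive lines are left commented out:

```shell
#!/bin/sh
# SQL to watch the event lag on the origin; sl_status is Slony-I's
# built-in lag-reporting view (cluster schema name from this thread).
LAG_SQL="select st_received, st_lag_num_events, st_lag_time from _custdb_slony.sl_status;"

# Run it against the origin node (host/db names are placeholders):
# psql -h master -d custdb -c "$LAG_SQL"

# To generate SYNCs less often during catch-up, raise sync_interval
# (milliseconds) for the slon managing the ORIGIN node, then kill and
# restart only that slon.  Never restart the slon managing the
# subscriber -- that restarts the whole COPY.  A hypothetical
# config-file edit (path is an assumption):
# echo "sync_interval=10000" >> /etc/slon/node1.conf

echo "$LAG_SQL"
```

Rerunning the lag query every few minutes and watching st_lag_num_events stop growing (and then shrink) is the simplest confirmation that catch-up has begun.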
Bad mojo to kill *that* slon - that would restart the subscription process! :-( But it should be fine to kill and restart the slon managing the origin node with the new parameter value. If you're most of the way through, though, there's little point.

Thinking aloud, I wonder if we might want to make this parameter more "adaptive", so that sync_interval lengthens automatically when all subscriptions are way behind. But that's not of any use to you today.

There's one more thing you could do on the replica: run "vacuum verbose" against the 32GB table. The output of that should give you an idea of how far through the process it is. Mind you, Slony kicks off a re-index of all the indexes on the table after the COPY, so it could be that the data is all in place and it's busy indexing. Note that this re-index is a performance win; we shut the indexes off and reindex at the end because mass-reindexing is quite a bit faster than loading data into a table with its indexes turned on.
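Another way to put a number on replica-side progress during the COPY, alongside the "vacuum verbose" suggestion above: sample the on-disk size of the target table. `pg_size_pretty` and `pg_total_relation_size` are standard PostgreSQL functions; the host and database names here are placeholders, and the polling loop is commented out so nothing runs unattended:

```shell
#!/bin/sh
# Rough progress gauge: the replica-side table grows as COPY streams
# rows in.  pg_total_relation_size counts table + index bytes, so it
# keeps climbing through the post-COPY reindex phase as well.
SIZE_SQL="select pg_size_pretty(pg_total_relation_size('sch2.order_hist'));"

# Sample it once a minute (host/db names are placeholders):
# while true; do
#   psql -h replica -d custdb -t -c "$SIZE_SQL"
#   sleep 60
# done

# The author's own suggestion, for comparison -- vacuum verbose reports
# page and tuple counts, which also hint at how much data is in place:
# psql -h replica -d custdb -c 'vacuum verbose "sch2"."order_hist";'

echo "$SIZE_SQL"
```

If the reported size is already near the 32GB of the origin table but the COPY statement is still listed in pg_stat_activity, the remaining time is likely the reindex phase described above.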